-
Notifications
You must be signed in to change notification settings - Fork 167
Add a new configuration option to control table header row inference #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new configuration option to control table header row inference #161
Conversation
@SomeBottle - thanks for the pull request! I was not aware of this behavior, and I would want your proposed behavior in our own application. Thanks for catching this and proposing a solution. I have some suggestions:
@AlexVonB, @matthewwithanm - my preference would be to not infer |
yeah, that behavior is surprising to me |
Thank you for your response! I've renamed the option and removed Should I change the default value of the option to |
@SomeBottle - yes, please change the default to not infer header rows from data rows unless specified by the user. |
OK, I just set the default value of the option to |
@SomeBottle - thanks! Looking through your code, I see this: is_headrow = (
all([cell.name == 'th' for cell in cells])
or (el.parent.name == 'thead'
# avoid multiple tr in thead # <----
and len(el.parent.find_all('tr')) == 1) # <----
) What is the thinking behind the check for multiple |
Yes, considering the possibility of such an unusual situation, which is actually an invalid header for markdown table, I think it should be treated as a header missing case, and the rows should be regarded as data rows. |
@SomeBottle - could you rebase/merge this branch to the latest state of |
bfb1c35
to
123fdaa
Compare
@chrispy-snps OK, I've rebased the branch. |
@SomeBottle - thanks for your contribution! |
…matthewwithanm#161) Add option to infer first table row as table header (defaults to false)
For the instance that a table does not contain a header defined by
<thead>
or<th>
, markdownify will use the first row of the table as header fallback by default.However, sometimes I expect the first row to be parsed as a part of the tbody, for example:
I expect it to be converted as follows:
I think I may not be the only one who has encountered this problem, so I added a new option
table_header_fallback
to control this behavior.table_header_fallback
is set toTrue
by default, and the html above will be converted to:If I set
table_header_fallback=False
, the result will be:I've added this option to the ArgumentParser in
main.py
(alongwith another--escape-misc
that may be forgotten), and I also created corresponding test cases.Hope this will help, thank you for your work!